Secure search method and secure search device

ABSTRACT

In a search that uses a searchable code, the search query and the secure indexes are conventionally collated in round-robin fashion. Here, a characteristic quantity of the deposited data is registered in a database server together with the secure index required to realize the searchable code. The server uses the characteristic quantity to cluster the secure indexes. In a search, collation is first performed only on the representative data of each cluster. On a hit, the collation priority is raised for all the data included in the cluster. When there is no hit, the priority is lowered. After the priority is calculated, collation is performed sequentially for all the data on the basis of the priority.

TECHNICAL FIELD

The present invention relates to a secure search system using a searchable code that searches encrypted data deposited in a server without decrypting it, in a client-server model such as cloud computing.

BACKGROUND ART

With the popularization of cloud computing, the deposition of information in a database server is actively used. In the meantime, the leakage of confidential information such as personal information is also becoming a great social problem.

To securely deposit information in a database server while preventing the leakage of the information, a searchable code technique that enables searching deposited data in an encrypted state has been proposed. By using a searchable code, information can be prevented from leaking not only to a third party on a channel but also to a database server manager.

For the searchable code technique, various methods are proposed. A search using a searchable code is generally performed according to the following procedure.

(1) A client that deposits data calculates an index representing the contents of the deposited data and secures it. In this case, securing means processing proper to a searchable code which makes it difficult to acquire the contents of the deposited data based upon the corresponding index. Hereinafter, the index that is secured is called a secure index.

(2) The client encrypts the deposited data (hereinafter called encrypted data) and transmits it to a database server together with the secure index.

(3) The database server registers a pair of the encrypted data and the secure index in a database.

(4) The search client that searches the data calculates a trapdoor of a keyword (a search query) to be searched. In this case, the trapdoor means information for a search and, in particular, the secured form of the search keyword included in the search query.

(5) The search client transmits the trapdoor to the database server.

(6) The database server searches for data that hits the search query by collating the secure index registered in the database and the trapdoor in a procedure proper to the searchable code.

(7) The database server transmits encrypted data corresponding to the hit secure index and the like to the search client.

(8) The search client specifies the client that deposited the data based upon a received search result and shares a decrypting key with the corresponding client.

(9) The search client decrypts the encrypted data received from the database server using the shared key.

As the deposited data is encrypted, it is substantially impossible for a database server manager to decrypt the deposited data. In addition, since the index is secured, it is difficult to extract the contents of the deposited data based upon the index. Further, since the search query is converted to the trapdoor, the possibility of the leakage of the search query is also low. Further, since it is also difficult to determine whether different secure indexes include the same keyword or not, an attack such as frequency analysis, which estimates a plaintext based upon the frequency of appearance of a word, can be prevented. As described above, by using a searchable code technique, information can be substantially prevented from leaking not only to a third party on a channel but also to the database server manager and the like.

For the searchable code technique, Non-patent Literature 1 and Non-patent Literature 2 are known, for example. These methods adopt a randomized encryption method in which a plaintext and its ciphertext have a one-to-many (1 to m) correspondence, which is more secure than a deterministic encryption method in which a plaintext and its ciphertext have a one-to-one (1 to 1) correspondence. These methods are relatively secure against attacks such as frequency analysis.

In addition, Non-patent Literature 3, Non-patent Literature 4 and Patent Literature 1 are also known. In the methods disclosed in Non-patent Literature 3 and Non-patent Literature 4, tolerance to attacks such as frequency analysis is provided by utilizing a Bloom filter, which is a randomized data structure. In the method disclosed in Patent Literature 1, tolerance to attacks such as frequency analysis is provided by using the Fuzzy Vault Scheme, which realizes fuzzy collation between clusters using an error-correcting code.

The techniques disclosed in Non-patent Literatures 1 to 4 and Patent Literature 1 guarantee security against frequency analysis by utilizing randomized encryption, a randomized data structure, a fuzzy collation technique and the like. As a concrete example, when plural data pieces including a keyword “cloud” are deposited in the database server, the corresponding secure index is different for each deposited data piece. Further, it is difficult to determine that the secure indexes include the same keyword “cloud”. Furthermore, even if a search is made based upon “cloud”, it is difficult to guess the search query “cloud” based upon the trapdoor. Therefore, even if the fact that the search query is hit is known, the database server manager cannot substantially know whether the secure index includes “cloud” or not.

CITATION LIST

Patent Literature

-   Patent Literature 1: JP-A No. 2009-271584

Non-Patent Literature

-   Non-patent Literature 1: Dawn Xiaodong Song, David Wagner and Adrian Perrig: “Practical Techniques for Searches on Encrypted Data,” In Proceedings of the 2000 IEEE Symposium on Security and Privacy, pp. 44-55 (2000).
-   Non-patent Literature 2: Zhiqiang Yang, Sheng Zhong, Rebecca N. Wright: “Privacy-Preserving Queries on Encrypted Data,” In Proceedings of the 11th European Symposium on Research in Computer Security (Esorics), Vol. 4189 of Lecture Notes in Computer Science, pp. 476-495 (2006).
-   Non-patent Literature 3: Eu-Jin Goh: “Secure Indexes,” Cryptology ePrint Archive, Report 2003/216 (2003).
-   Non-patent Literature 4: Takanori Kan, Takashi Nishide, Yoshiaki Hori, Koichi Sakurai: “Design Implementation and Evaluation of Symmetric Key Encryption with Flexible Keyword Search by Using Bloom Filter”, IEICE technical report Vol. 111, No. 30, pp. 111-116 (2011).
-   Non-patent Literature 5: A. D. Bimbo: “Visual Information Retrieval”, Morgan Kaufmann Publishers (1999).
-   Non-patent Literature 6: Susumu Serita, Yasuhiro Fujii, Masaru Kai, Takao Murakami, Yoshinori Honda: “Similarity Hashing resistant to file modifications”, IEICE technical report Vol. 110, No. 282, pp. 31-36 (2010).
-   Non-patent Literature 7: C. M. Bishop: “Pattern Recognition and Machine Learning”, Springer Japan Inc. (2007).
-   Non-patent Literature 8: F. Murtagh: “A Survey of Recent Advances in Hierarchical Clustering Algorithms”, The Computer Journal, vol. 26, pp. 354-359 (1983).

SUMMARY OF INVENTION

Technical Problem

Generally, in a search of a character string, search response time is reduced by providing an index (an inverted index) that associates each word with the documents including the word. When no index exists, a search query and deposited data are required to be collated in a round-robin fashion every time and a search response is greatly delayed.
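The difference between the two approaches can be illustrated with a minimal sketch on unencrypted text. The function names and sample documents are hypothetical and serve only as an illustration; they are not part of the disclosed system.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_with_index(index, word):
    """Look the word up directly; no document is scanned at search time."""
    return sorted(index.get(word.lower(), set()))


def search_round_robin(documents, word):
    """Collate the query with every document in turn (no index available)."""
    word = word.lower()
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if word in text.lower().split()
    )


# Hypothetical sample data.
documents = {
    1: "cloud storage service",
    2: "secure search",
    3: "cloud computing security",
}
index = build_inverted_index(documents)
```

Both functions return the same result for a given word, but the round-robin version has to touch every document on every query, which is the cost that arises when an inverted index cannot be built.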

In the techniques disclosed in Non-patent Literatures 1 to 4 and in JP-A No. 2009-271584, as tolerance to attacks such as frequency analysis is provided, it is difficult to determine what word a secure index includes. That is, it is substantially impossible to configure an index such as an inverted index. Therefore, in these techniques, a search query and a secure index are required to be collated in round-robin fashion every time and a search response is greatly delayed. An object of the present invention is to accelerate a search in a secure search system using a searchable code.

Solution to Problem

To achieve the object, the present invention provides means for registering not only encrypted data and a secure index but also a characteristic quantity of deposited data in a database server. In this case, the characteristic quantity means data whose length is greatly reduced while impairing the characteristics of the deposited data as little as possible, so that the similarity of deposited data can be calculated using only the characteristic quantity. However, it is assumed to be difficult to guess the original data based upon the characteristic quantity. As characteristic quantities, a characteristic vector calculated based upon words and the like in the deposited data, and a quantity called a fuzzy hash, acquired by dividing the deposited data, calculating a hash value of each divided part and connecting the hash values, are known.
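As a rough illustration of the fuzzy hash mentioned above, the following sketch divides the data into fixed-size blocks, hashes each block and connects shortened hash values. The fixed block size, the use of SHA-256, the truncation to two hexadecimal characters and the set-based similarity measure are all assumptions made for this example; practical fuzzy hashes typically choose block boundaries from the content itself.

```python
import hashlib


def fuzzy_hash(data: bytes, block_size: int = 16, digest_chars: int = 2) -> str:
    """Divide the data, hash each block and connect the shortened hash values."""
    pieces = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Keep only a few characters per block so the result stays short.
        pieces.append(hashlib.sha256(block).hexdigest()[:digest_chars])
    return "".join(pieces)


def similarity(hash_a: str, hash_b: str, digest_chars: int = 2) -> float:
    """Share of per-block hash values common to both fuzzy hashes (Jaccard)."""
    set_a = {hash_a[i:i + digest_chars] for i in range(0, len(hash_a), digest_chars)}
    set_b = {hash_b[i:i + digest_chars] for i in range(0, len(hash_b), digest_chars)}
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample data: the second item differs only in its last block.
original = b"A" * 16 + b"B" * 16 + b"C" * 16
modified = b"A" * 16 + b"B" * 16 + b"D" * 16
```

Because each block is hashed separately, a local modification changes only the corresponding part of the fuzzy hash, so similar data tends to yield similar values while the short per-block hash values reveal little about the original content.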

Next, the present invention provides means for calculating, on the side of the database server, the similarity of deposited data using the characteristic quantity received together with the encrypted data and the secure index, and for clustering the secure indexes and the like so that similar deposited data are included in the same cluster.
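The clustering step can be sketched as follows, assuming that the characteristic quantity is a numeric characteristic vector. Cosine similarity and a simple greedy (leader) clustering with a fixed threshold are substituted here purely for illustration; the concrete similarity measure and clustering algorithm of the embodiment are not specified in this passage, and all names and values below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two characteristic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(features, threshold=0.8):
    """Greedy clustering: join the first cluster whose pivot is similar enough.

    `features` maps a record id to its characteristic vector. The first
    member of each cluster is used as its representative (pivot).
    """
    clusters = []
    for record_id, vector in features.items():
        for cluster in clusters:
            pivot_vector = features[cluster["pivot"]]
            if cosine_similarity(vector, pivot_vector) >= threshold:
                cluster["members"].append(record_id)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append({"pivot": record_id, "members": [record_id]})
    return clusters


# Hypothetical characteristic vectors for four registered records.
features = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.9, 0.2],
}
clusters = cluster_by_similarity(features)
```

Only the characteristic quantities are consulted here; the server never needs the plaintext or the decryption key to group the registered data.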

Further, the present invention provides means for, in a secure search process, first selecting a representative (hereinafter called a pivot) of the secure indexes in each cluster and collating the pivot and a trapdoor (acquired by securing a search keyword included in a search query). When the pivot hits the trapdoor, the priority of the collation with the trapdoor of all registered data included in the cluster to which the pivot belongs is ranked higher; when the pivot does not hit the trapdoor, that priority is ranked lower. The secure search process is accelerated by sequentially collating all registered data and the trapdoor after the priority of the objects to be collated is determined and discontinuing the collation after a fixed number of times, while inhibiting the deterioration of security and search precision.
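A minimal sketch of this prioritized collation is given below. The data layout, the function name `prioritized_search`, the two-level priority and the reading of the discontinuation as a cap on the number of records collated (`max_collations`) are assumptions for illustration. The collation function is passed in as a parameter, and the one used in the sample data is a plaintext placeholder, not a searchable code.

```python
from typing import Callable, Dict, List, Optional


def prioritized_search(
    clusters: Dict[str, List[int]],
    pivots: Dict[str, int],
    secure_indexes: Dict[int, object],
    trapdoor: object,
    collate: Callable[[object, object], bool],
    max_collations: Optional[int] = None,
) -> List[int]:
    """Collate pivots first, then collate all records in priority order."""
    high, low = [], []
    pivot_result = {}
    for cluster_id, members in clusters.items():
        pivot = pivots[cluster_id]
        hit = collate(trapdoor, secure_indexes[pivot])
        pivot_result[pivot] = hit
        # A pivot hit raises the priority of every record in its cluster.
        (high if hit else low).extend(members)

    order = high + low
    if max_collations is not None:
        # Assumed interpretation: stop after a fixed number of records.
        order = order[:max_collations]

    hits = []
    for record_id in order:
        if record_id in pivot_result:
            hit = pivot_result[record_id]  # already collated as a pivot
        else:
            hit = collate(trapdoor, secure_indexes[record_id])
        if hit:
            hits.append(record_id)
    return hits


# Hypothetical sample data with a placeholder collation on plaintext sets.
clusters = {"b": [3, 4], "a": [1, 2]}
pivots = {"b": 3, "a": 1}
secure_indexes = {1: {"cloud"}, 2: {"cloud", "server"}, 3: {"index"}, 4: {"cloud"}}


def toy_collate(trapdoor, index):
    return trapdoor in index
```

With these sample values, the cluster whose pivot hits is collated first, so a cut-off after two records still returns the two hits from that cluster while the hit in the low-priority cluster is missed. This is the trade-off between response time and search precision that the cut-off parameter controls.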

Advantageous Effects of Invention

In a secure search system that searches encrypted data deposited in a database server without decrypting it, a secure search can be accelerated by clustering secure indexes using a characteristic quantity from which the original data is difficult to guess, while inhibiting the deterioration of security and search precision.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 shows an outline of a secure search system in an embodiment of the present invention.

FIG. 2 shows the schematic configuration of a registered client in the embodiment of the present invention.

FIG. 3 shows the schematic configuration of a search client in the embodiment of the present invention.

FIG. 4 shows the schematic configuration of a database server in the embodiment of the present invention.

FIG. 5 shows a sequence of a data registration process between the registered client and the database server in the embodiment of the present invention.

FIG. 6 shows the data configuration of a registered data storage location management table and a cluster management table respectively prepared by the database server in the embodiment of the present invention.

FIG. 7 shows a sequence of a secure search process between the search client and the database server in the embodiment of the present invention.

FIG. 8 is a flowchart showing a procedure for calculating priority by the database server in the embodiment of the present invention.

FIG. 9 is a flowchart showing a procedure for collating a secure index and a trapdoor by the database server in the embodiment of the present invention.

FIG. 10 is a flowchart showing a procedure for collating the secure index and the trapdoor by the database server in the embodiment of the present invention.

FIG. 11 shows a setting screen in the search client or the database server in the embodiment of the present invention.

FIG. 12 shows an outline of processing in the present invention.

FIG. 13 shows a procedure for generating the secure index.

FIG. 14 shows a procedure for collating the secure index and the trapdoor.

FIG. 15 shows a procedure for generating a characteristic quantity vector based upon deposited data.

DESCRIPTION OF EMBODIMENTS

Referring to the drawings, an embodiment of the present invention will be described in detail below.

(System Configuration)

FIG. 1 is a schematic diagram showing a secure search system according to an embodiment of the present invention. As shown in FIG. 1, the search system is provided with “n” registered clients 20-1 to 20-n, “m” search clients 30-1 to 30-m and a database server 40, and is designed so that these can mutually transmit and receive information via a network 10. In this case, “n” and “m” are each an integer of 1 or larger, and “n” and “m” may also be different. Registered clients 20-1 to 20-n have the same configuration. An arbitrary one of them will be called a registered client 20 below. Similarly, search clients 30-1 to 30-m have the same configuration. An arbitrary one of them will be called a search client 30 below.

The registered client 20 in this embodiment functions as a transmitter-receiver for data registration that transmits encrypted deposited data and the like to the database server 40. The search client 30 functions as a transmitter-receiver for a search that transmits a secure search query to the database server 40 and receives a result of the search. The database server 40 functions as a secure search device that registers the encrypted deposited data and the like in a database and searches data in the database.

(Registered Client)

FIG. 2 shows the schematic configuration of the registered client 20. As shown in FIG. 2, the registered client 20 is provided with a CPU 212, a memory 214, a storage device 216, a key generator 218, a registration unit 220, a user interface 230 and a communication interface 232, and is designed so that these can mutually transmit and receive information via an internal bus 200. The registration unit 220 is provided with an encryption device 222, a secure index generator 224, a characteristic quantity calculator 226 and a setting device 228. These devices can independently transmit and receive information to/from the CPU 212 and the like via the internal bus 200.

First, general components will be described. The CPU 212 is a central processing unit that calculates various numeric values, processes information and controls equipment. The memory 214 is a semiconductor storage device such as RAM and ROM from/to which the CPU 212 can directly read and write information. For the storage device 216, a hard disk, a magnetic tape, a flash memory and the like which respectively store data and a program in a computer can be given. The storage device 216 stores the data to be deposited in the database server 40.

The key generator 218 generates a key and the like for encrypting or decrypting data and further, executes processing for sharing the decrypting key with the search client 30. The sharing of the decrypting key will be described using FIG. 7 later.

For the user interface 230, a display, a mouse, a keyboard and the like which respectively output a result of processing to a user and respectively reflect a request of the user in each component of the registered client 20 can be given. The communication interface 232 controls the transmission and the reception of data between each component of the registered client 20 and an external device such as the search client 30 and the database server 40.

Components proper to the present invention are the registration unit 220 and the group which configures the registration unit, namely the encryption device 222, the secure index generator 224, the characteristic quantity calculator 226 and the setting device 228. Of these, the most characteristic component, which is not included in conventional searchable code techniques, is the characteristic quantity calculator 226. First, the components of the registration unit 220 will be described.

The encryption device 222 reads the data to be deposited in the database server 40 from the storage device 216, encrypts the data using an encryption key generated in the key generator 218, and notifies the registration unit 220 of the encrypted data or temporarily outputs the encrypted data to the memory 214 or the storage device 216.

The secure index generator 224 reads the data to be deposited in the database server 40 from the storage device 216, generates a secure index based upon the contents of the deposited data according to an algorithm proper to a searchable code, and notifies the registration unit 220 of the generated secure index or temporarily outputs the generated secure index to the memory 214 or the storage device 216. A concrete procedure for generating the secure index will be described using FIG. 5 later.

The characteristic quantity calculator 226 reads the data to be deposited in the database server 40 from the storage device 216, calculates a characteristic quantity of the deposited data according to a predetermined algorithm, and notifies the registration unit 220 of the calculated characteristic quantity or temporarily outputs the calculated characteristic quantity to the memory 214 or the storage device 216. A concrete procedure for calculating the characteristic quantity will be described using FIG. 5 later.

The setting device 228 sets a parameter required for processing such as encryption, the generation of a secure index and the calculation of a characteristic quantity. The corresponding parameter is set by a user via the user interface 230 and is reflected in the registration unit 220, the encryption device 222, the secure index generator 224 and the characteristic quantity calculator 226.

The registration unit 220 receives an instruction for registration in the database server 40 from a user via the user interface 230, controls the encryption device 222, the secure index generator 224 and the characteristic quantity calculator 226 for the instructed data stored in the storage device 216, calculates the encrypted data, the secure index and the characteristic quantity, and transmits a set of these to the database server 40 via the communication interface 232. The details of the registration of data will be described using FIG. 5 later.

As for the key generator 218, the registration unit 220 and the group of the encryption device 222, the secure index generator 224, the characteristic quantity calculator 226 and the setting device 228 which respectively configure the registration unit, the respective devices may execute processing independently, or each device may be provided only as a program and the CPU 212 may execute the processing by reading the corresponding program into the memory 214.

(Search Client)

FIG. 3 shows the schematic configuration of the search client 30. As shown in FIG. 3, the search client 30 is provided with a CPU 312, a memory 314, a storage device 316, a search unit 320, a user interface 330 and a communication interface 332, and is designed so that these can mutually transmit and receive information via an internal bus 300. In addition, the search unit 320 is provided with a trapdoor generator 322, a key sharer 324, a decryption device 326 and a setting device 328. These devices can independently transmit and receive information to/from the CPU 312 and the like via the internal bus 300. A trapdoor means information for a search and is acquired by securing a search keyword included in a search query.

The CPU 312, the memory 314, the storage device 316, the user interface 330 and the communication interface 332, which are general components, have functions similar to those described in relation to FIG. 2. For components proper to the present invention, the search unit 320 and the group of the trapdoor generator 322, the key sharer 324, the decryption device 326 and the setting device 328 which respectively configure the search unit can be given. The description of the general components such as the CPU 312 is omitted and the components of the search unit 320 will be described below.

The trapdoor generator 322 receives a search query from a user via the user interface 330, generates a trapdoor by securing a keyword included in the search query according to an algorithm proper to a searchable code, and notifies the search unit 320 of the generated trapdoor or outputs the generated trapdoor to the memory 314 or the storage device 316. A concrete procedure for generating the trapdoor will be described using FIG. 7 later.

The key sharer 324 is a device for sharing a decrypting key of the corresponding encrypted data with the registered client 20 when the key sharer receives the encrypted data that hits the search query from the database server 40. The shared decrypting key is temporarily stored in the search unit 320, in the memory 314 or in the storage device 316. A concrete process for sharing the key will be described using FIG. 7 later.

The decryption device 326 decrypts the encrypted data received from the database server 40 using the decrypting key acquired by the key sharer 324, and notifies the search unit 320 of the decrypted data or temporarily outputs it to the memory 314 or the storage device 316.

The setting device 328 sets a parameter required for processing such as the generation of a trapdoor, the sharing of a key and decryption. The corresponding parameter is set by a user via the user interface 330 and is reflected in the search unit 320, the trapdoor generator 322, the key sharer 324 and the decryption device 326. One example of the setting of a parameter will be described using FIG. 11 later.

The search unit 320 receives a search query from a user via the user interface 330, controls the trapdoor generator 322 to generate a trapdoor based upon the search query, transmits the generated trapdoor to the database server 40, decrypts the encrypted data returned from the database server 40 by controlling the key sharer 324 and the decryption device 326, and outputs the decrypted data to the memory 314 or the storage device 316 or provides it to the user via the user interface 330. The details of the process will be described using FIG. 7 later.

As for the search unit 320 and the group of the trapdoor generator 322, the key sharer 324, the decryption device 326 and the setting device 328 which respectively configure the search unit 320, the respective devices may execute processing independently, or each device may be provided only as a program and the CPU 312 may execute the processing by reading the corresponding program into the memory 314.

(Database Server)

FIG. 4 shows the schematic configuration of the database server 40. As shown in FIG. 4, the database server 40 is provided with a CPU 412, a memory 414, a storage device 416, an authentication unit 418, a registration unit 420, a clustering unit 430, a search unit 440, a setting unit 450, a user interface 460 and a communication interface 462, and is designed so that these can mutually transmit and receive information via an internal bus 400. In addition, the clustering unit 430 is provided with a similarity calculator 432. The similarity calculator 432 can independently transmit and receive information to/from the CPU 412 via the internal bus 400. Moreover, the search unit 440 is provided with a priority calculator 442 and a collator 444. These devices can also independently transmit and receive information to/from the CPU 412 via the internal bus 400.

The description of the CPU 412, the memory 414, the storage device 416, the user interface 460 and the communication interface 462, which are general components, is omitted because they have functions similar to those described in relation to FIG. 2.

The authentication unit 418 manages the ID and the password of each user who is allowed to register and search data in the database server 40. The details will be described using FIG. 5 later.

For components proper to the present invention, the registration unit 420, the clustering unit 430, the similarity calculator 432 which configures the clustering unit, the search unit 440, the priority calculator 442 and the collator 444 which respectively configure the search unit, and the setting unit 450 can be given.

The registration unit 420 registers a set of encrypted data, a secure index and a characteristic quantity in the storage device 416 when the registration unit receives the set from the registered client 20. Hereinafter, this set is called registered data. The concrete contents of registration will be described using FIG. 6 later.

The clustering unit 430 clusters the registered data registered in the storage device 416 and temporarily outputs a result of clustering to the memory 414 or the storage device 416. For clustering, the similarity between registered data is required to be calculated. The similarity calculator 432 performs this calculation using the characteristic quantity in the registered data. Concrete processing for clustering will be described using FIG. 5 later. In addition, FIG. 6 takes up a concrete example of a result of clustering.

The similarity calculator 432 calculates the similarity of two registered data pieces according to a request from the clustering unit 430. The clustering unit 430 temporarily stores the registered data pieces whose similarity is to be calculated in the memory 414 or the storage device 416, or directly passes them to the similarity calculator 432. The calculated similarity is temporarily output to the memory 414 or the storage device 416, or is directly returned to the clustering unit 430. A concrete procedure for calculating similarity will be described using FIG. 5 later.

The search unit 440 receives a trapdoor of a search query from the search client 30 and returns encrypted data that hits the search query to the search client 30 via the communication interface 462. When the search unit 440 receives the trapdoor, it first activates the priority calculator 442.

The priority calculator 442 calculates the priority of collation based upon the result of clustering by the processing of the clustering unit 430. The priority calculator compares the trapdoor with a pivot, which is a secure index that represents the secure indexes of each cluster, and determines the priority in collating each cluster. The calculated priority is temporarily output to the memory 414 or the storage device 416 or is directly returned to the search unit 440. A concrete procedure for calculating priority will be described using FIG. 8 later. The pivot of each cluster is set/determined in the clustering unit 430 or the priority calculator 442.

Next, the search unit 440 calls the collator 444 and instructs the collator to collate the trapdoor and the secure indexes in descending order of priority according to the priority stored in the memory 414, the storage device 416 or the search unit 440. The search unit 440 temporarily stores the trapdoor and the secure index to be collated in the memory 414 or the storage device 416 or directly passes them to the collator 444. The collator 444 collates the trapdoor and the secure index according to an algorithm proper to a searchable code and temporarily outputs a result of the collation to the memory 414 or the storage device 416, or directly passes the result of the collation to the search unit 440. A concrete procedure for collation will be described using FIG. 9 later.

In the related art, all registered data is required to be collated with the trapdoor in round-robin fashion. In contrast, a great reduction of search response time is realized here because the priority calculator 442 sets the priority in collating registered data using the characteristic quantity, the search unit 440 collates the trapdoor and the secure indexes in descending order of priority and the collation is discontinued after a fixed number of times. Further, as it is difficult to estimate the original contents of deposited data based upon a characteristic quantity, the deterioration of security can be inhibited. A concrete procedure for a search will be described using FIGS. 7 to 10 later.

The setting unit 450 sets a parameter required for processing such as clustering and a search. The parameter is set by a database server manager via the user interface 460 and is reflected in the registration unit 420, the clustering unit 430, the similarity calculator 432, the search unit 440, the priority calculator 442 and the collator 444.

As for the authentication unit 418, the registration unit 420, the clustering unit 430, the similarity calculator 432, the search unit 440, the priority calculator 442, the collator 444 and the setting unit 450, the respective devices may execute processing independently, or each device may be provided only as a program and the CPU 412 may execute the processing by reading the corresponding program into the memory 414.

(Summary of Processing)

First, an outline of processing by the secure search method according to the present invention will be described using FIG. 12.

(1) The registered client registers not only encrypted data acquired by encrypting deposited data and a secure index acquired by securing an index extracted from the deposited data but also the characteristic quantity of the deposited data in the database server (1201). In this case, the characteristic quantity is acquired by greatly reducing data length while impairing the characteristics of the deposited data as little as possible. As examples, a characteristic vector calculated based upon words and the like in the deposited data, and a quantity called a fuzzy hash, acquired by dividing the deposited data, calculating a hash value of each divided part and connecting the hash values, can be given.

(2) The database server calculates the similarity of the deposited data corresponding to the characteristic quantity using the received characteristic quantity and clusters the secure indexes and the like so that similar deposited data is included in the same cluster (1202).

(3) In a secure search process, the database server first selects a representative (a pivot) of the secure indexes in each cluster and determines the priority of the cluster in collating registered data by collating the pivot and a trapdoor acquired by securing, in the search client, a search keyword included in a search query (1203).

(4) The database server collates all registered data in units of clusters based upon the priority (1204) and outputs a result of the search to the search client.

In determining the priority, when the pivot hits the trapdoor, the priority of the collation with the trapdoor of all the registered data included in the cluster to which the pivot belongs is raised and when the pivot does not hit the trapdoor, the priority of the collation of all the registered data included in the cluster to which the pivot belongs is lowered. Further, to accelerate the secure search process, when all the registered data is sequentially collated after the priority of objects to be collated is determined, the collation is discontinued after a fixed number of times.

(Data Registration Process)

FIG. 5 shows a sequence of a data registration process between the registered client 20 and the database server 40. Referring to FIG. 5, the data registration process will be described and concrete contents of the procedure for generating the secure index and the procedure for calculating a characteristic quantity respectively described in relation to FIG. 2 and the clustering process and the procedure for calculating similarity respectively described in relation to FIG. 4 will also be described below.

The data registration in the database server 40 by the registered client 20 roughly includes a data generation process S50 in which the registered client 20 generates registered data, a data transmission/reception process S52 in which the registered client 20 and the database server 40 transmit/receive data and a clustering process S54 in which the database server 40 clusters the registered data.

The data generation process S50 proceeds in the following procedure.

(S500) A user of the registered client 20 designates data to be registered in the database server 40 via the user interface 230. The registration unit 220 that receives the designation first activates the key generator 218. The key generator 218 generates a pair of an encryption key and a decryption key and stores the pair in the memory 214 or the storage device 216. The encryption device 222 applies an encryption process to the data (the deposited data) designated to be registered using the encryption key generated by the key generator 218. Since the generated decryption key may later be transmitted to an external device at the request of the search client 30, the storage device 216 or the key generator 218 itself holds it.

(S502) The secure index generator 224 generates a secure index based upon the contents of the deposited data. A concrete generation method based upon Non-patent Literature 1 is as follows. FIG. 13 shows a procedure for generating the secure index.

(S502-1) Words (w1, w2, ...) to be collated are extracted from the deposited data. In English, words are extracted as character strings delimited by blanks. In Japanese, words can be extracted by the N-gram method, which decomposes a sentence in the deposited data into character strings of fixed length, or by morphological analysis.

(S502-2) Respective hash values (h1, h2, ...) of the extracted words (w1, w2, ...) are calculated. The bit length of each hash value shall be n.

(S502-3) A random number sequence ri (i=1, 2, ...) of c bits is generated for each hash value hi. A message digest di of (n-c) bits is acquired by performing a predetermined operation on each random number sequence ri. The predetermined operation for calculating the digest uses, for example, a hash function different from the one used above to calculate the hash value of each word.

(S502-4) The message digest di is appended to the end of the random number sequence ri and a bit string si (i=1, 2, ...) of length n is acquired. The exclusive-OR of each hash value hi and the bit string si becomes a secure index Hi.

Next, the representation of the secure index Hi acquired in the steps S502-1 to S502-4 will be described referring to FIG. 13. When the exclusive-OR is expressed by “XOR” and the connection of two bit strings is expressed by “|”, the secure index Hi is represented as Hi = hi XOR (ri|di) using hi (the hash value of the word wi) and (ri|di) (the bit string si configured by the random number sequence ri and the message digest di). Here, the hash value hi of a word wi, the random number sequence ri and the message digest di are respectively generated by a hash function h(wi)=hi, the generation of a random number R(hi)=ri and a predetermined operation f(ri)=di.
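The steps S502-1 to S502-4 can be sketched as follows. This is only an illustrative sketch: the choice of SHA-256 for h, BLAKE2b for f, the values n=256 and c=128, and the function names are all assumptions made for the example and are not specified in this description or confirmed against Non-patent Literature 1.

```python
import hashlib
import secrets

N_BYTES = 32  # n = 256 bits: length of each hash value hi (assumed)
C_BYTES = 16  # c = 128 bits: length of each random number sequence ri (assumed)


def word_hash(word: str) -> bytes:
    """h(wi) = hi. SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(word.encode("utf-8")).digest()


def digest(r: bytes) -> bytes:
    """f(ri) = di: an (n-c)-bit digest from a different hash function (BLAKE2b, assumed)."""
    return hashlib.blake2b(r, digest_size=N_BYTES - C_BYTES).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def secure_index(text: str) -> list:
    """Return the secure indexes Hi = hi XOR (ri|di), one per blank-delimited word."""
    indexes = []
    for word in text.split():              # S502-1: extract words
        h = word_hash(word)                # S502-2: hash value hi
        r = secrets.token_bytes(C_BYTES)   # S502-3: random number sequence ri
        s = r + digest(r)                  # S502-4: bit string si = ri|di
        indexes.append(xor_bytes(h, s))    # Hi = hi XOR si
    return indexes
```

Because ri is freshly generated each time, registering the same word twice yields two different secure indexes, which is what makes it difficult to recover words from the index.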

Since not a word itself but a hash value is used and the exclusive-OR of the hash value and a random number sequence is calculated, it is difficult to acquire words in the deposited data based upon a secure index. For the details, refer to Non-patent Literature 1. A method of collating the secure index acquired as described above and a trapdoor will be clarified in the following description in relation to FIG. 8.

(S504) The characteristic quantity calculator 226 calculates a characteristic quantity based upon the contents of the deposited data. The simplest characteristic quantity is attribute information that is difficult to change arbitrarily and that takes a continuous value, such as the size of the deposited data. When the size of the deposited data is the characteristic quantity, the similarity of two deposited data pieces of sizes s1 and s2 can be approximated by 1/(1+|s1−s2|). The similarity has values of 0 to 1, and the more similar the deposited data pieces are, the closer to 1 the similarity is.
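As a small worked illustration of the size-based similarity, the formula 1/(1+|s1−s2|) can be written as below. The function name is hypothetical and sizes are assumed to be given in bytes.

```python
def size_similarity(s1: int, s2: int) -> float:
    """Similarity of two deposited data pieces from their sizes: 1/(1+|s1-s2|)."""
    return 1.0 / (1.0 + abs(s1 - s2))
```

Two pieces of identical size give a similarity of 1, and the value falls toward 0 as the size difference grows.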

As a more refined characteristic quantity, a method of generating a vector (a characteristic vector) based upon the words in the deposited data is known. The characteristic vector is acquired according to the following procedure. FIG. 15 shows a procedure for generating a characteristic vector.

(S504-1) Words (w1, w2, ...) to be collated are extracted from the deposited data.

(S504-2) Respective hash values (h1, h2, ...) of the extracted words (w1, w2, ...) are calculated. The bit length of each hash value shall be n.

(S504-3) The hash values h1, h2, ... are ORed bit by bit. The result is regarded as an n-dimensional vector and is called a characteristic vector.

Using the characteristic vectors calculated as described above, the similarity of two deposited data pieces can be approximated by the number of bit positions at which both characteristic vectors are 1. That is, the similarity is the number of 1 bits included in the result of ANDing the respective bits of the characteristic vectors corresponding to the two deposited data pieces. For the details of the characteristic vector, refer to Non-patent Literature 5.
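The steps S504-1 to S504-3 and the AND-based similarity can be sketched as follows. The hash function (SHA-256 truncated to n=64 bits) and the function names are assumptions made for the example; the description does not fix them.

```python
import hashlib

N_BITS = 64  # n: bit length of each hash value (assumed)


def word_hash_bits(word: str) -> int:
    """Hash value of a word as an n-bit integer (truncated SHA-256, assumed)."""
    d = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(d[: N_BITS // 8], "big")


def characteristic_vector(text: str) -> int:
    """Bitwise OR of the hash values of all blank-delimited words (S504-1 to S504-3)."""
    v = 0
    for word in text.split():
        v |= word_hash_bits(word)
    return v


def vector_similarity(v1: int, v2: int) -> int:
    """Number of bit positions at which both characteristic vectors are 1."""
    return bin(v1 & v2).count("1")
```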

In addition, similarity can be calculated without extracting words by using a quantity called a fuzzy hash as the characteristic quantity. The fuzzy hash is calculated according to the following procedure.

(S504-a) The deposited data is divided. In addition to a method of dividing at fixed length, a method of dividing the deposited data so that a specific bit string forms a boundary is known.

(S504-b) A hash value (h1, h2, ...) of each divided data piece (d1, d2, ...) is calculated.

(S504-c) An array of the hash values (h1, h2, ...) is output as a fuzzy hash.

Let H=(h1, h2, ...) be the fuzzy hash corresponding to one deposited data piece and F=(f1, f2, ...) be the fuzzy hash corresponding to another deposited data piece. The similarity of the two deposited data pieces can be approximated by the ratio n/(N−n), where n is the number of elements in the intersection of (h1, h2, ...) and (f1, f2, ...) (the number of hash values included in both H and F) and N−n is the number of elements in their union (the number acquired by subtracting n from the sum N of the numbers of elements in H and F). When the numbers of elements in H and F are m1 and m2, N=m1+m2 and 0≤n≤min(m1, m2)<N.
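The fuzzy hash of steps S504-a to S504-c and the ratio n/(N−n) can be sketched as follows. Fixed-length division, the chunk size, the truncated SHA-256 hash and the function names are assumptions made for the example; duplicate hash values are also assumed to count once, so that m1 and m2 are set sizes.

```python
import hashlib


def fuzzy_hash(data: bytes, chunk: int = 4) -> list:
    """Divide data at fixed length (S504-a) and hash each piece (S504-b, S504-c)."""
    return [
        hashlib.sha256(data[i : i + chunk]).hexdigest()[:8]
        for i in range(0, len(data), chunk)
    ]


def fuzzy_similarity(H: list, F: list) -> float:
    """Ratio n/(N-n): size of the intersection over size of the union."""
    hs, fs = set(H), set(F)
    n = len(hs & fs)          # hash values included in both H and F
    N = len(hs) + len(fs)     # N = m1 + m2
    return n / (N - n) if N > n else 0.0
```

For example, two 12-byte inputs that share two of their three 4-byte pieces give n=2 and N=6, so the similarity is 2/(6−2)=0.5.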

Various fuzzy hash techniques have been proposed. For the details, refer to Non-patent Literature 6.

Next, the data transmission/reception process S52 proceeds according to the following procedure.

(S520) A channel for the transmission/reception of data between the registered client 20 and the database server 40 is established. Specifically, the authentication unit 418 first authenticates a user based upon an ID, a password and the like of the user of the registered client 20 via the communication interfaces 232, 462 and the user interface 230. When the authentication unit 418 judges that the user is a normal user registered beforehand, it establishes the channel between the registration unit 220 of the registered client 20 and the registration unit 420 of the database server 40. At this time, the authentication unit 418 also collects identification information such as an IP address of the registered client 20 and stores it in the storage device 416. This identification information is required when a key is shared between the search client 30 and the registered client 20. The details will be described later using FIGS. 6 and 7. When the authentication unit 418 judges that the user is not a normal user, it establishes no channel and terminates the processing.

(S522) The registration unit 220 transmits registered data (a set of encrypted data, a secure index and a characteristic quantity) to the database server 40 via the communication interface 232.

(S524) The registration unit 420 receives the registered data transmitted via the communication unit 462 and stores it in the storage device 416. Concrete contents of the registration will be described later using FIG. 6.

(S526) The registration unit 420 notifies the registration unit 220 of the registered client 20 of the completion of registration via the communication interface 462.

(S528) The authentication unit 418 releases the channel established between the registration unit 220 of the registered client 20 and the registration unit 420 of the database server 40.

According to the above procedure, the user of the registered client 20 can deposit the user's own data in the database server 40 without making the contents of the data known to the manager of the database server 40 or a third party on the channel.

The database server 40 performs the clustering S54 of the registered data after the registration of the data. K-means clustering and hierarchical clustering are known as representative clustering methods. First, the K-means clustering is performed according to the following procedure. The number of characteristic quantities shall be N.

(S54-1) The centers of “K” clusters are set at random. Alternatively, when the plural secure indexes included in each cluster are arranged in a predetermined order, the secure index located in the center of the order is set as the center of the corresponding cluster.

(S54-2) The similarity of each characteristic quantity xi (i=1, 2, ..., N) and the “K” centers is calculated and the most similar center is acquired. Each characteristic quantity xi is allocated to the cluster to which the corresponding center belongs.

(S54-3) When the allocation of all the characteristic quantities to the clusters is unchanged, the process is finished. Otherwise, the center of each cluster is recalculated based upon the allocated characteristic quantities and the process returns to the step S54-2.

The result depends upon the random initial setting of the cluster centers. However, since the quantity of calculation is on the order of “NK”, there is a merit that the calculation is relatively fast. For the details, refer to Non-patent Literature 7.

In the K-means clustering, the center of a cluster is required to be calculated based upon the characteristic quantities which belong to that cluster. When the size of the deposited data is used as the characteristic quantity and the characteristic quantities xk (k=1, ..., m), which are the respective sizes of “m” pieces of deposited data, belong to a certain cluster, its center v is acquired by calculating (x1+ ... +xm)/m.
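The steps S54-1 to S54-3 with the data size as the characteristic quantity can be sketched as follows. To keep the example reproducible, the initial centers are passed in by the caller instead of being set at random; this and the function name are assumptions made for the example. Since the similarity 1/(1+|s1−s2|) decreases with the size difference, the most similar center is simply the nearest one.

```python
def kmeans_sizes(sizes, initial_centers, max_iter=100):
    """One-dimensional K-means over data sizes.

    Returns (assignment, centers), where assignment[i] is the index of the
    cluster to which sizes[i] is allocated.
    """
    centers = list(initial_centers)                      # S54-1 (caller-supplied)
    assignment = None
    for _ in range(max_iter):
        # S54-2: allocate each size to the most similar (nearest) center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in sizes
        ]
        # S54-3: finish when the allocation is unchanged.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each center as the mean (x1 + ... + xm) / m of its members.
        for k in range(len(centers)):
            members = [x for x, a in zip(sizes, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment, centers
```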

When a characteristic vector is used as the characteristic quantity, the center is acquired as follows. Let “vi” be the “ith” element of a center vector “v”, and let “xk,i” (i=1, ..., n) be the “ith” element of the characteristic vector “xk” configured by the “n” characteristic quantities of deposited data “k” included in a cluster including “m” pieces of deposited data. Let <xi> be the mean value of “xk,i” over the cluster and “ui” be its standard deviation. If the ratio (ui/<xi>) is not larger than 1/C (C: a constant of approximately 2 to 10), that is, if (ui/<xi>)≤(1/C) or equivalently (<xi>/ui)≥C, many of the “xk,i” concentrate in the vicinity of the mean value <xi>, which means that the “ith” characteristic quantity of the “n” characteristic quantities is effective as a characteristic quantity of the cluster; in that case, vi=1. Otherwise (the “ith” characteristic quantity is not effective as a characteristic quantity of the cluster), vi=0. In this way the center vector “v” (v1, ..., vn) having elements “vi” which are each 1 or 0 can be acquired. That is, the center vector “v” is a characteristic quantity vector showing which of the “n” characteristic quantities are effective as characteristic quantities of the cluster.

In addition, when the above-mentioned <xi> is positive, the element “vi” (1 or 0) acquired by the above-mentioned discriminant of the center vector is expressed as [pi]−[|pi−1|] using Gauss' notation, where pi=(<xi>/(ui·C)).
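The discriminant for the center vector can be sketched as follows. The use of the population standard deviation, the treatment of a zero standard deviation with a positive mean as effective (vi=1), and the function names are assumptions made for the example. The second function evaluates the Gauss-notation form [p]−[|p−1|], which is 1 for p≥1 and 0 for 0<p<1.

```python
import math
from statistics import mean, pstdev


def center_vector(vectors, C=2.0):
    """Return v = (v1, ..., vn): vi = 1 if <xi>/ui >= C, otherwise 0."""
    center = []
    for column in zip(*vectors):          # the i-th elements xk,i for k = 1..m
        m, u = mean(column), pstdev(column)
        # m >= C*u is equivalent to <xi>/ui >= C and avoids dividing by ui = 0.
        center.append(1 if m > 0 and m >= C * u else 0)
    return center


def gauss_form(p: float) -> int:
    """[p] - [|p - 1|] in Gauss' notation (floor), for p = <xi>/(ui*C) > 0."""
    return math.floor(p) - math.floor(abs(p - 1))
```

In the usage below, only the first element is 1 in every member of the cluster, so only that element is judged effective.

```python
cluster = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
print(center_vector(cluster))  # [1, 0, 0]
```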

In the meantime, as the calculation of similarity is special when a fuzzy hash is used as the characteristic quantity, it is difficult to acquire the center based upon the characteristic quantity.

Hierarchical clustering is a method that can cluster any kind of characteristic quantity as long as similarity can be calculated. It is performed according to the following procedure.

(S54-a) N clusters, each including only one characteristic quantity, are generated.

(S54-b) The distance between clusters i and j is calculated based upon the distance (dissimilarity) between their respective characteristic quantities xi and xj, and the two closest clusters are merged into one cluster.

(S54-c) This merger is repeated until all objects are merged into one cluster.

The output of hierarchical clustering has a tree structure called a dendrogram. The dendrogram shows not only which cluster each data piece belongs to but also the distance between data pieces in the cluster. It is known that the quantity of calculation can be kept to the order of the square of N by devising the way clusters are merged. For the details, refer to Non-patent Literature 8.
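The steps S54-a to S54-c can be sketched as follows. The distance between two clusters is assumed here to be the smallest dissimilarity between any pair of their members (single linkage); the description does not fix this choice. The sketch is a naive version whose cost grows faster than the square of N, and the function name and the merge-list representation of the dendrogram are hypothetical.

```python
def hierarchical_clustering(items, dissimilarity):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance).

    Initial clusters are numbered 0..N-1 and each merge creates the next number,
    so the returned list describes a dendrogram.
    """
    clusters = {i: [i] for i in range(len(items))}       # S54-a
    merges = []
    next_id = len(items)
    while len(clusters) > 1:                              # S54-c
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):                     # S54-b: find the closest pair
            for b in ids[pos + 1:]:
                d = min(
                    dissimilarity(items[i], items[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

Only a dissimilarity function is required, so the same code applies to sizes, characteristic vectors or fuzzy hashes (for example with 1 minus the similarity as the dissimilarity).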

In summary, the K-means clustering is fast. However, it can be applied only when the center can be determined based upon the characteristic quantity. The hierarchical clustering is slower than the K-means clustering. However, any data can be clustered as long as similarity can be calculated.

FIG. 6 shows the respective data configurations of a registered data storage location management table 60 and a cluster management table 62 prepared by the database server. The registration unit 420 of the database server 40 prepares the registered data storage location management table 60 and stores it in the storage device 416 when the registration unit receives a set (registered data) of encrypted data, a secure index and a characteristic quantity from the registered client 20.

The registered data storage location management table 60 is provided with a registered ID column 600 that stores a registered data ID for uniquely identifying registered data, an encrypted data column 602 that records the storage location in the storage device 416 of the received encrypted data, a secure index column 604 that records the storage location in the storage device 416 of the secure index, a characteristic quantity column 606 that records the storage location in the storage device 416 of the characteristic quantity, a registered client column 608 that stores identification information of the registered client 20 which registers the encrypted data and the like, and a column 608 that stores the other required items.

The registration unit 420 issues registered data IDs so that registered data can be uniquely identified, increasing the value by 1 every time registered data is added.

Examples of the information recorded in the encrypted data column 602, the secure index column 604 and the characteristic quantity column 606 are a file name and a sector address in the storage device 416 of the encrypted data and the like. Since the data volume of the characteristic quantity is less than that of the encrypted data, the characteristic quantity may also be directly stored in the characteristic quantity column 606.

An example of the information stored in the registered client column 608 is an IP address of the registered client 20 that registers the encrypted data and the like. The authentication unit 418 acquires this information in the step S520 shown in FIG. 5, and this information is required when a key is shared between the search client 30 and the registered client 20. The details of the key sharing process will be described later using FIG. 7.

Further, an example of the information stored in the column 608 is the date on which the data is registered.

The clustering unit 430 records the result of clustering using the characteristic quantities in the cluster management table 62 and stores it in the storage device 416. The cluster management table 62 includes a cluster ID column 620 that stores a cluster ID for uniquely identifying a cluster, a registered data ID column 622 that stores the registered data ID 600 of the registered data which belongs to the cluster, and a column 624 that stores the other required items. An example of the information stored in the column 624 is information related to the center of a cluster in the K-means clustering.
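The two tables of FIG. 6 can be modeled, purely for illustration, as the following in-memory structures. The class and field names are hypothetical; an actual database server would keep these tables in the storage device 416.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class RegisteredDataRow:
    """One row of the registered data storage location management table 60."""
    registered_id: int             # registered ID column 600
    encrypted_data_location: str   # encrypted data column 602 (e.g. a file name)
    secure_index_location: str     # secure index column 604
    characteristic_quantity: object  # column 606 (location, or the value itself)
    registered_client: str         # registered client column 608 (e.g. an IP address)
    other: dict = field(default_factory=dict)  # other required items (e.g. date)


@dataclass
class ClusterRow:
    """One row of the cluster management table 62."""
    cluster_id: int                # cluster ID column 620
    registered_ids: list           # registered data ID column 622
    other: dict = field(default_factory=dict)  # column 624 (e.g. cluster center)


class RegisteredDataTable:
    """Issues registered data IDs that increase by 1 for each added row."""

    def __init__(self):
        self._ids = count(1)
        self.rows = {}

    def add(self, encrypted, index, quantity, client, **other) -> int:
        registered_id = next(self._ids)
        self.rows[registered_id] = RegisteredDataRow(
            registered_id, encrypted, index, quantity, client, dict(other)
        )
        return registered_id
```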

(Details of Secure Search Process)

The characteristic quantity calculation method, the clustering method, and the units and the devices for realizing these have been described. These methods and units/devices correspond to the so-called prior preparation required for accelerating a secure search. The details of the secure search process executed by the database server will be described below.

(Search Process by Search Client and Database Server)

FIG. 7 shows a sequence of the search process by the search client and the database server. The search process will be described below with reference to FIG. 7, together with the concrete processing of the key sharing procedure described in relation to FIGS. 2, 3 and 6 and of the trapdoor generating procedure described in relation to FIG. 3.

The secure search process executed by the search client 30 and the database server 40 roughly includes a trapdoor generation step S70 in which the search client 30 generates a trapdoor based upon a search query, a secure search step S72 for searching between the search client 30 and the database server 40, and a decryption step S74 for sharing a decryption key between the registered client 20 and the search client 30 and decrypting the encrypted data.

In the trapdoor generation step S70, the search unit 320 of the search client 30 receives the search query from a user via the user interface 330 and generates a trapdoor based upon the search query by controlling the trapdoor generator 322. A concrete example of the trapdoor generation step S70 based upon Non-patent Literature 1 is as follows.

(S70-1) The hash function used in the secure index generation step (S502-2) is prepared.

(S70-2) A hash value of the search query (a search keyword) is calculated using the hash function (securing the search keyword). This hash value becomes the trapdoor.

Since the hash value of the search query is used, it is difficult to specify the search query based upon the trapdoor. A method of collating the trapdoor and the secure index respectively acquired as described above will be clarified later in the description in relation to FIG. 9.

The secure search step S72 proceeds according to the following procedure.

(S720) A channel for transmitting and receiving data is established between the search client 30 and the database server 40. Specifically, the authentication unit 418 first authenticates a user based upon an ID and a password of the user of the search client 30 via the communication interfaces 332, 462 and the user interface 330. When the authentication unit 418 judges that the user is a normal user registered beforehand, it establishes a channel between the search unit 320 of the search client 30 and the search unit 440 of the database server 40. When the authentication unit 418 judges that the user is not a normal user, it establishes no channel and finishes the process.

(S722) The search unit 320 transmits a trapdoor to the database server 40 via the communication interface 332. The search unit 440 of the database server 40 receives the trapdoor via the communication interface 462.

(S724) The priority calculator 442 of the search unit 440 calculates the priority of collation by collating the received trapdoor and a part of the secure indexes stored in the storage device 416. A concrete priority calculation procedure will be described using FIG. 8.

(S726) The search unit 440 controls the collator 444 based upon the priority acquired in the step S724 so as to collate the trapdoor and the secure indexes in descending order of priority. A concrete collation procedure will be described later using FIG. 9.

(S728) The search unit 440 returns the hit encrypted data to the search unit 320. In addition, the search unit 440 also returns the IP address of the registered client 20 stored in the registered client column 608 of the registered data storage location management table 60, because it is required for the search client 30 to acquire a decryption key in the following decryption step S74.

(S730) The authentication unit 418 releases the channel established between the search unit 320 and the search unit 440.

Next, the decryption step S74 will be described. To decrypt the encrypted data acquired as a result of the search, the decryption key is required to be shared with the registered client 20. As methods of sharing the key, a method using a public key cryptosystem as utilized in SSL (Secure Sockets Layer) and the DH (Diffie-Hellman) key sharing method utilized in IPsec (Security Architecture for Internet Protocol) are known. A concrete procedure of the method of sharing the key using the public key cryptosystem will be described below.

(S740) The key sharer 324 extracts identification information, such as an IP address, of the registered client 20 that registered the encrypted data and the like from the result of the search received from the database server 40. The registered client 20 owns the decryption key of the encrypted data. Before the decryption key is shared, it must first be verified that the search client 30 is a normal client that is not spoofing. The authentication between the key sharer 324 and the registered client 20 proceeds according to the following procedure.

(S740-1) The key sharer 324 connects with the registered client 20 based upon the IP address of the registered client 20 via the communication interface 332.

(S740-2) The key generator 218 of the registered client 20 requests a certificate from the key sharer 324 of the search client 30 via the communication interface 232. In this case, the certificate is a certificate in which a reliable third party (CA: Certificate Authority) applies an electronic signature to a public key of the search client 30.

(S740-3) The key sharer 324 transmits the certificate to the registered client 20.

(S740-4) The key generator 218 verifies the signature of the certificate and acquires the public key of the search client 30. When the verification of the signature fails, the channel is disconnected because the certificate is invalid, and the process is finished.

(S740-5) The key sharer 324 generates a message, adds a message digest to it, encrypts it with a secret key which the key sharer 324 owns, and transmits it to the key generator 218.

(S740-6) The key generator 218 decrypts the message using the public key of the key sharer 324. The key generator 218 creates a message digest based upon the decrypted message and compares it with the message digest added by the key sharer 324. When the two message digests coincide, it is determined that an unfalsified message has been received from the normal search client 30 and the authentication is completed. If not, it is determined that the search client 30 is not a normal client, the channel is disconnected, and the process is finished.

In the authentication procedure described in (S740-1) to (S740-6), all the normal search clients 30 to which a normal certificate is issued by the CA can decrypt the encrypted data independently of the intention of the user of the registered client 20. To limit the destinations to which the decryption key is transmitted, it suffices for the key generator 218 to also read identification information of the search client 30 from the connection source information, the certificate and the like when the search client 30 connects in the step S740-1 and when the certificate is verified in the step S740-4, and to disconnect the channel so that the decryption key is not transmitted to any device other than a predetermined search client. The user can specify the destinations of the transmission of the decryption key via the user interface 230.

After the authentication is completed, the search client 30 acquires the decryption key from the registered client 20 according to the following procedure.

(S742) The key generator 218 of the registered client 20 encrypts the decryption key it owns with the public key acquired from the certificate and transmits it to the key sharer 324 of the search client 30 via the communication interface 232. The key sharer 324 decrypts the encrypted decryption key with its own secret key and acquires the desired decryption key.

(S744) The decryption device 326 decrypts the encrypted data using the decryption key acquired in S742 and the search process is completed.

According to the above-mentioned procedure, the user of the search client 30 can acquire a predetermined search result without substantially making the contents of the search query and the search result known to the manager of the database server 40 or a third party on the channel.

(Determination of Priority)

FIG. 8 is a flowchart showing a procedure for the priority calculation step S724 executed by the database server 40. The priority calculation process executed after a trapdoor is received from the search client 30 will be described below referring to FIG. 8. Concrete processing for the collation of the secure index and the trapdoor described in relation to FIG. 5 will also be described. The priority calculator 442 of the search unit 440 of the database server 40 executes all the following processing.

(S800) “1” is set as a variable p for counting clusters.

(S802) Representative data is selected out of all the registered data which belongs to the cluster whose cluster ID is p. The representative data may be selected at random out of all the registered data that belongs to the corresponding cluster, or, when the K-means clustering is used, the registered data closest to the center of the corresponding cluster may be used as the representative data. For example, when the registered data in a cluster is arranged in a predetermined order, the data located in the vicinity of the center of the whole order is used as the representative data.

(S804) When the variable p is smaller than the number of all clusters, p is incremented by 1 (S806), the processing returns to S802, and similar processing is executed for the next cluster. If not, the processing proceeds to a step S810.

(S810) “1” is set to a variable q for counting clusters.

(S812) The secure index of the representative data of the cluster whose cluster ID is q is collated with the trapdoor. An example of a concrete collation method based upon Non-patent Literature 1 is as follows. FIG. 14 shows a procedure for collating the secure index and the trapdoor. The secure index Hi acquired in (S502-1) to (S502-4) shown in FIG. 13 is assumed to have been generated by the exclusive-OR of the hash value hi generated based upon the word wi in the deposited data and the bit string acquired by connecting the random number sequence ri with its message digest di (i=1, 2, ...). In addition, the trapdoor acquired according to the procedure in (S70-1) to (S70-2) shall be h′. The subscript “i” is an identifier for the respective words included in the deposited data.

(S812-1) The exclusive-OR of the secure index Hi and the trapdoor h′ is calculated for each word wi.

(S812-2) The same predetermined operation as that in the step S502-3 shown in FIG. 13 is applied to the bit string r′i formed by the leading c bits of the exclusive-OR (a bit string S′i) to calculate a message digest Di, and Di is compared with the bit string d′i formed by the remaining n-c bits of the exclusive-OR.

(S812-3) If hi=h′, that is, if the original word wi and the search keyword coincide, exclusive-ORing the secure index Hi and the trapdoor h′ leaves only the random number sequence ri and its message digest di. Therefore, if the message digest Di of r′i coincides with d′i, it can be judged that hi is equal to h′ and that the deposited data corresponding to the secure index includes the search query (the search keyword) corresponding to the trapdoor. Hereinafter, this is simply called a hit in a search. When the message digest Di does not coincide with d′i, it is judged that the deposited data corresponding to the secure index does not include the search query corresponding to the trapdoor.

Next, the algorithm for determining the coincidence of the search query and the trapdoor in the steps S812-2 and S812-3 will be described referring to FIG. 14.

When the exclusive-OR is written “XOR”, the complement (negation) of a set A is written “$\overline{A}$”, OR is “+”, AND is “·” and the connection of two bit strings is “|”, the exclusive-OR of three sets is generally $(A\ \mathrm{XOR}\ B)\ \mathrm{XOR}\ C = X \cdot B + \overline{X} \cdot \overline{B}$, where $X = \overline{(A\ \mathrm{XOR}\ C)}$ and $\overline{X} = (A\ \mathrm{XOR}\ C)$. In this case, when $A = h_i$ (the hash value of the word wi), $B = (r_i|d_i)$ (a bit string Si configured by the random number sequence ri and the message digest di) and $C = h'$ (the trapdoor), $X = \overline{h_i\ \mathrm{XOR}\ h'}$ and $\overline{X} = h_i\ \mathrm{XOR}\ h'$. Especially, when $h_i = h'$, $X = 1$ and $\overline{X} = 0$.

When the exclusive-OR of the secure index Hi and the trapdoor h′ is $H_i\ \mathrm{XOR}\ h' = (r'_i|d'_i)$ (a bit string S′i configured by the random number sequence and the message digest) (S812-1) and the above-mentioned representation of Hi is assigned to Hi on the left side, $r'_i = X \cdot r_i + \overline{X} \cdot \overline{r_i}$ and $d'_i = X \cdot d_i + \overline{X} \cdot \overline{d_i}$.

Since $X = 1$ and $\overline{X} = 0$ if $h_i = h'$, $r'_i = r_i$ and $d'_i = d_i$. Accordingly, as the message digest Di based upon the trapdoor is $f(r'_i) = f(r_i) = d_i$ (S812-2) and further $d'_i = d_i$, $D_i = d'_i$ (S812-3).

Accordingly, if hi=h′, that is, if the word wi is the search keyword, Di=d′i (S812-3).

By utilizing the properties of the random number sequence and the exclusive-OR, collation with the trapdoor is enabled even though plaintext and ciphertext are not in one-to-one correspondence. For the details, refer to Non-patent Literature 1. The representative data (the registered data) in the cluster, especially the secure index in that registered data, is also called a pivot.
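The trapdoor generation (S70-1, S70-2) and the collation (S812-1 to S812-3) can be sketched together as follows. As before, SHA-256 for h, BLAKE2b for f, n=256, c=128 and all function names are assumptions made for the example, not choices stated in this description.

```python
import hashlib
import secrets

N_BYTES, C_BYTES = 32, 16  # n = 256 bits and c = 128 bits (assumed)


def word_hash(word: str) -> bytes:
    """h(wi) = hi (SHA-256, assumed)."""
    return hashlib.sha256(word.encode("utf-8")).digest()


def digest(r: bytes) -> bytes:
    """f(ri) = di, an (n-c)-bit digest (BLAKE2b, assumed)."""
    return hashlib.blake2b(r, digest_size=N_BYTES - C_BYTES).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_secure_index(word: str) -> bytes:
    """Hi = hi XOR (ri|di), as in the steps S502-2 to S502-4."""
    r = secrets.token_bytes(C_BYTES)
    return xor_bytes(word_hash(word), r + digest(r))


def make_trapdoor(keyword: str) -> bytes:
    """S70-2: the trapdoor h' is the hash value of the search keyword."""
    return word_hash(keyword)


def collate(secure_index: bytes, trapdoor: bytes) -> bool:
    """Return True on a hit, that is, when Di = f(r'i) equals d'i."""
    s = xor_bytes(secure_index, trapdoor)   # S812-1: S'i = Hi XOR h'
    r, d = s[:C_BYTES], s[C_BYTES:]         # leading c bits r'i, remaining n-c bits d'i
    return digest(r) == d                   # S812-2 and S812-3
```

When the keyword differs from the indexed word, S′i is effectively a random bit string, so under these assumptions a false hit would require a 128-bit digest to match by chance.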

(S814) The priority of the cluster whose cluster ID is q is calculated so that the larger the rate of coincidence of the secure index and the trapdoor is, the higher the priority is. When there is just one trapdoor, the rate of coincidence is a binary value showing whether or not a hit occurs in the search. Since there are plural trapdoors when plural search keywords are specified and an AND (logical product) search or an OR (logical sum) search of these is made, the rate of coincidence is given by the ratio of the number of trapdoors that hit in the search to the number of all the trapdoors.

(S816) When the variable q is smaller than the number of all clusters, q is incremented by 1 (S818), the process returns to the step S812, and similar processing is performed for the next cluster. If not, the process proceeds to a step S820.

(S820) The priority calculator 442 sorts the cluster IDs in descending order of priority and outputs the result of the sort to the memory 414 or the storage device 416. Thereby, the priority calculation process is finished.
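The priority calculation of FIG. 8 can be sketched as follows. The collation function is passed in as a parameter so that the sketch does not depend on a particular searchable code; the pivot is assumed to be the member in the middle of each cluster's order, which is one of the options mentioned for the step S802. All names are hypothetical.

```python
def hit_rate(secure_indexes, trapdoors, collate):
    """Ratio of the trapdoors that hit at least one secure index (S814)."""
    hits = sum(
        1 for td in trapdoors if any(collate(H, td) for H in secure_indexes)
    )
    return hits / len(trapdoors)


def calculate_priority(clusters, index_of, trapdoors, collate):
    """Return the cluster IDs sorted in descending order of priority (S820).

    clusters: dict mapping cluster ID -> ordered list of registered data IDs
    index_of: dict mapping registered data ID -> list of secure indexes
    """
    rates = {}
    for cluster_id, members in clusters.items():
        pivot = members[len(members) // 2]                 # S802 (middle of the order)
        rates[cluster_id] = hit_rate(index_of[pivot], trapdoors, collate)  # S812, S814
    return sorted(rates, key=rates.get, reverse=True)      # S820
```

In the usage below, plain strings and an equality test stand in for real secure indexes and collation purely to show the control flow.

```python
clusters = {1: [10, 11], 2: [20, 21], 3: [30]}
index_of = {10: ["a"], 11: ["a", "b"], 20: ["x", "y"], 21: ["x"], 30: ["x"]}
order = calculate_priority(clusters, index_of, ["x", "y"], lambda H, td: H == td)
print(order)  # [2, 3, 1]
```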

(Collation of Registered Data Based Upon Priority of Cluster)

FIG. 9 is a flowchart showing a procedure for the collation of the secure index and the trapdoor (the step S812 shown in FIG. 8) executed by the database server 40. The embodiment of the present invention is characterized in that the secure search processing is accelerated, while the deterioration of security and search precision is inhibited, by determining the priority of the objects to be collated using the characteristic quantities of the deposited data, sequentially collating the objects in that order, and discontinuing the collation once a fixed number of collations has been performed. Hereinafter, the number of times collation is performed is called a collated frequency. The collated frequency is set beforehand by a user of the registered client 20 via the user interface 230. The setting of the collated frequency will be described later using FIG. 11.

The processing for collating the secure index and the trapdoor is performed according to the following procedure. The following processing (except S904) is all executed by the search unit 440.

(S900) A variable t for counting the collated frequency is set to 0 (zero), and a variable k for counting clusters is set to 1.

(S902) A variable n for counting registered data in the cluster is set to 1. The priority output by the priority calculator 442 in the step S820 is read from the memory 414 or the storage device 416, and the cluster Ck having the k-th highest priority is specified.

(S904) The collator 444 collates the secure index of the n-th registered data included in the cluster Ck with the trapdoor. When they hit, the corresponding registered data ID is temporarily output to the memory 414 or the storage device 416 together with the rate of coincidence between the secure index and the trapdoor. When they do not hit, nothing is output.

(S906) The variable t showing the collated frequency is incremented by 1.

(S908) When the variable t is smaller than a predetermined collated frequency, the processing proceeds to the next step S910. If not, the processing proceeds to a step S918 and is finished.

(S910) When the variable n for counting registered data is smaller than the number of all registered data included in the cluster Ck, n is incremented by 1 (S912), the processing returns to the step S904, and similar processing is applied to the next registered data in the cluster. If not, the processing proceeds to a step S914.

(S914) When the variable k showing priority is smaller than the number of all clusters, k is incremented by 1 (S916), the processing returns to the step S902, and similar processing is applied to the cluster having the next highest priority. If not, the processing proceeds to the step S918.

(S918) The search unit 440 outputs, to the memory 414 or the storage device 416, the encrypted data corresponding to the registered data IDs temporarily output to the memory 414 or the storage device 416 by the collator 444, together with the rate of coincidence. Thereby, the collation processing of the secure index and the trapdoor is finished.

Suppose, in this embodiment, that the registered client 20 registers in the database server 40 1000 pieces of deposited data including the keyword “cloud” and 9000 pieces of deposited data not including “cloud”. By clustering processing, the database server 40 divides the registered data into two clusters, a cluster A that includes “cloud” and a cluster B that does not include “cloud”, and manages the registered data. When the search client 30 searches with a search query of “cloud”, the search query (the trapdoor) is first collated with a pivot (a representative of the secure indexes) of each cluster, so that the 1000 pieces of registered data in the cluster A are collated preferentially. Therefore, even if the collation is discontinued after 1000 times, all the 1000 pieces of deposited data including “cloud” are hit. In contrast, in a conventional secure search system using a searchable code, all the deposited data including “cloud” are hit only after all 10000 pieces of registered data have been collated with the trapdoor for “cloud” in the search query. Accordingly, the embodiment can accelerate the search by a factor of 10 compared with the related art.

As described above, according to the embodiment of the present invention described using FIGS. 1 to 9, the secure search can be accelerated while inhibiting the deterioration of security and search precision by clustering secure indexes using a characteristic quantity from which the original data is difficult to guess.
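The collation procedure of the steps S900 to S918 and the numerical example above can be sketched as follows. This is only an illustration under stated assumptions: `toy_collate` and `collate_with_budget` are hypothetical names, the secure index is modeled as a plain set of keywords, and the collation is a membership test standing in for the searchable code. The clusters are assumed to be already sorted in descending order of priority.

```python
from typing import Callable, List, Sequence, Set, Tuple

Entry = Tuple[int, Set[str]]  # (registered data ID, toy "secure index")


def toy_collate(secure_index: Set[str], trapdoor: str) -> bool:
    """Stand-in for collation (assumption): a plain membership test."""
    return trapdoor in secure_index


def collate_with_budget(
    clusters_by_priority: Sequence[Sequence[Entry]],
    trapdoor: str,
    max_collations: int,
    collate: Callable[[Set[str], str], bool] = toy_collate,
) -> List[int]:
    """Collate in order of cluster priority and stop at a fixed collated frequency."""
    hit_ids: List[int] = []
    t = 0  # S900: counter of the collated frequency
    for cluster in clusters_by_priority:  # S902, S914, S916
        for data_id, secure_index in cluster:  # S904, S910, S912
            if collate(secure_index, trapdoor):
                hit_ids.append(data_id)
            t += 1  # S906
            if t >= max_collations:  # S908: discontinue the collation
                return hit_ids  # S918
    return hit_ids  # S918


# 1000 pieces of data including "cloud" (cluster A), 9000 pieces without it (cluster B).
cluster_a = [(i, {"cloud"}) for i in range(1000)]
cluster_b = [(i, {"other"}) for i in range(1000, 10000)]

# Cluster A first, as the priority calculation would order the clusters.
prioritized = collate_with_budget([cluster_a, cluster_b], "cloud", 1000)
# Cluster A last: the worst case when no priority is used.
unprioritized = collate_with_budget([cluster_b, cluster_a], "cloud", 1000)
print(len(prioritized), len(unprioritized))  # prints: 1000 0
```

With the same collated frequency of 1000, the prioritized order finds all 1000 matching pieces, whereas the worst-case order without priority finds none of them.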

FIG. 10 is a flowchart showing another procedure for collating a secure index and a trapdoor performed by the database server 40. In the example shown in FIG. 9, the acceleration of the secure search process is realized by discontinuing the collation at a fixed collated frequency. However, acceleration can also be realized by discontinuing collation not at a collated frequency but at a number of times at which the search hits (hereinafter called a frequency of hits). Specifically, the steps S906 and S908 shown in FIG. 9 are replaced with the following steps S906-a, S906-b and S908′. The following processing is all executed by the search unit 440.

(S906-a) The collator 444 judges whether or not the search hits. When the search hits, the processing proceeds to S906-b. When the search does not hit, the processing proceeds to the step S908′.

(S906-b) The variable t, which in this procedure shows the frequency of hits, is incremented by 1.

(S908′) When the variable t is smaller than a predetermined frequency of hits, the processing proceeds to the next step S910. If not, the processing proceeds to S918 and is finished.

The frequency of hits is set beforehand by a user of the registered client 20 via the user interface 230. The setting of the frequency of hits will be described later using FIG. 11.

Since collation is repeated until the number of hits reaches the preset frequency of hits in the method shown in the flowchart in FIG. 10, the method has an advantage that omissions from a search are reduced, compared with the search method described in relation to FIG. 9. On the other hand, the method shown in the flowchart in FIG. 10 has a shortcoming that if no search hits, the search is delayed. In contrast, in the method described in relation to FIG. 9, since collation is performed only up to the preset collated frequency, omissions from a search are more apt to occur than in the method shown in FIG. 10. However, the method described in relation to FIG. 9 has an advantage that search response time is kept fixed regardless of the result of the search.
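The variant of FIG. 10 and the trade-off described above can be sketched as follows. As before, this is a minimal illustration rather than the implementation of the embodiment: `toy_collate` and `collate_until_hits` are hypothetical names, and the collation is a set-membership test standing in for the searchable code. The function also returns the number of collations performed so that the cost of the search can be observed.

```python
from typing import Callable, List, Sequence, Set, Tuple

Entry = Tuple[int, Set[str]]  # (registered data ID, toy "secure index")


def toy_collate(secure_index: Set[str], trapdoor: str) -> bool:
    """Stand-in for collation (assumption): a plain membership test."""
    return trapdoor in secure_index


def collate_until_hits(
    clusters_by_priority: Sequence[Sequence[Entry]],
    trapdoor: str,
    max_hits: int,
    collate: Callable[[Set[str], str], bool] = toy_collate,
) -> Tuple[List[int], int]:
    """Collate in order of cluster priority and stop at a fixed frequency of hits.
    Returns the hit IDs and the number of collations actually performed."""
    hit_ids: List[int] = []
    collations = 0
    t = 0  # counter of the frequency of hits
    for cluster in clusters_by_priority:
        for data_id, secure_index in cluster:
            collations += 1
            if collate(secure_index, trapdoor):  # S906-a
                hit_ids.append(data_id)
                t += 1  # S906-b
            if t >= max_hits:  # S908': discontinue the collation
                return hit_ids, collations
    return hit_ids, collations


clusters = [
    [(1, {"a"}), (2, {"b"}), (3, {"a"})],
    [(4, {"a"}), (5, {"c"})],
]
found, cost_with_hits = collate_until_hits(clusters, "a", max_hits=2)
missed, cost_without_hits = collate_until_hits(clusters, "z", max_hits=2)
print(found, cost_with_hits)
print(missed, cost_without_hits)
```

When the trapdoor hits early, the collation stops after a few comparisons. When nothing hits, every piece of registered data is collated, which corresponds to the delay noted above for the method of FIG. 10.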

FIG. 11 shows a setting screen on the search client 30 or the database server 40. A collated frequency can be given as the parameter set by the setting device 328 of the search client 30 described in relation to FIG. 3.

Dialog boxes 1100, 1120 are examples of a screen which the setting device 328 presents to a user via the user interface 330 to let the user set a collated frequency. In the dialog box 1100, as a slide bar 1102 is moved leftward, the collated frequency is decreased and the search rate is enhanced, but the possibility that registered data is hit in a search decreases and search precision deteriorates. As the slide bar 1102 is moved rightward, the collated frequency is increased and the search rate is reduced, but the search precision is enhanced. According to the position of the slide bar 1102, the collated frequency is set to a predetermined value held by the setting device 328. In addition, the user can also directly set the collated frequency in an input box 1122 in the dialog box 1120. In the embodiment described in relation to FIG. 10, the above-mentioned collated frequency can be replaced with a frequency of hits.

In addition, an embodiment in which the collated frequency (or the frequency of hits) is set by the manager of the database server 40 is also allowed. In this case, the setting unit 450 of the database server 40 presents the dialog boxes 1100, 1120 via the user interface 460. When the manager of the database server 40 periodically checks the number of registered data and the state of clustering and adjusts the collated frequency to prevent search response time from being delayed, the quality of the secure search service can be guaranteed.

REFERENCE SIGNS LIST

10: Network, 20-1 to 20-n, 20: Registered client, 30-1 to 30-m, 30: Search client, 40: Database server, 200, 300, 400: Internal bus, 212, 312, 412: CPU, 214, 314, 414: Memory, 216, 316, 416: Storage device, 230, 330, 460: User interface, 232, 332, 462: Communication interface, 218: Key generator, 220: Registration unit, 222: Encryption device, 224: Secure index generator, 226: Characteristic quantity calculator, 228: Setting device, 320: Search unit, 322: Trapdoor generator, 324: Key sharer, 326: Decryption device, 328: Setting device, 418: Authentication unit, 420: Registration unit, 430: Clustering unit, 432: Similarity calculator, 440: Search unit, 442: Priority calculator, 444: Collator, 450: Setting unit, S50: Data generation process, S52: Data transmission/reception process, S54: Clustering process, 60: Registered data storage location management table, 600: Registered data ID column, 602: Encrypted data column, 604: Secure index column, 606: Characteristic quantity column, 608: Registered client column, 610: Column storing other required items, 62: Cluster management table, 620: Cluster ID column, 622: Registered data ID column, 624: Column storing other required items, S70: Trapdoor generation step, S72: Secure search step, S74: Decryption step, 1100, 1120: Dialog box, 1102: Slide bar, 1122: Input box

1. A registered client that transmits data to a secure search device, the registered client comprising: an encryption device that encrypts data transmitted to the secure search device and generates its encrypted data; a secure index generator that generates a secure index acquired by securing an index extracted from the data; a characteristic quantity calculator that calculates a characteristic quantity for calculating similarity between data pieces based upon the data; and a registration unit that transmits a set of the encrypted data, the secure index and the characteristic quantity to the secure search device.

2. The registered client according to claim 1, wherein the secure index generator in the registered client extracts search keywords from data; the secure index generator calculates a hash value for each of the extracted keywords; the secure index generator generates a random number sequence for each hash value and acquires a message digest for the random number sequence; and the secure index generator outputs the exclusive-OR of a bit string acquired by connecting the random number sequence with the message digest and the hash value as the secure index so as to equalize the sum of bit length of the random number sequence and bit length of the message digest to bit length of the hash value.

3. The registered client according to claim 1, wherein the characteristic quantity is provided in the form of data length of data transmitted to the secure search device; and similarity between two data pieces is calculated in 1/(1+|s1−s2|) including respective characteristic quantities s1, s2 of the two data pieces.

4. The registered client according to claim 1, wherein the characteristic quantity is provided by extracting words from data transmitted to the secure search device, calculating a hash value for each of the extracted words and regarding logical OR of the hash values as a bit string; and similarity between two data pieces is calculated based upon the number of bits which are both 1 in characteristic quantities of the two data pieces.

5. The registered client according to claim 1, wherein the characteristic quantity is provided as a set having hash values as components by dividing data transmitted to the secure search device on a predetermined specific bit pattern functioning as a boundary and calculating respective hash values of the divided data; and similarity between two data pieces is calculated based upon the ratio of the number of elements included in an intersection of characteristic quantities of the two data pieces and the number of elements included in a union of the characteristic quantities.

6. A search client that makes a search for a secure search device, comprising a trapdoor generator that generates a trapdoor acquired by securing a search keyword included in a search query for searching data registered in the secure search device.

7. The search client according to claim 6, wherein the trapdoor is acquired by calculating a hash value for the search query using a hash function by which a hash value of the search keyword is acquired when a secure index is generated based upon the search keyword extracted from the data.

8. A secure search device that receives data from a registered client and receives information for a search from a search client, the secure search device comprising: a receiver that receives, from the registered client, a set of encrypted data acquired by encrypting the data, a secure index acquired by securing an index extracted from the data and a characteristic quantity for calculating similarity between data pieces; a similarity calculator that calculates the similarity of two data pieces based upon the characteristic quantity received from the registered client; a clustering unit that clusters the encrypted data received from the registered client based upon the similarity calculated by the similarity calculator; a priority calculator that receives a trapdoor acquired by securing a search keyword included in a search query for searching data registered in the secure search device from the search client and calculates the priority of collation of the clustered encrypted data and the trapdoor based upon a result of clustering generated by the clustering unit; a collator that collates the secure index received from the registered client and the trapdoor; and a search unit that collates the encrypted data and the trapdoor by the collator based upon the priority calculated by the priority calculator in the order of clusters having higher priority by a predetermined frequency when the trapdoor is received from the search client, and returns the encrypted data that hits the trapdoor to the search client.

9. The secure search device according to claim 8, wherein the clustering unit generates one or more clusters and sets the center of each cluster at random; the clustering unit instructs the similarity calculator to calculate the similarity of the centers of all data received from the registered client based upon a characteristic quantity included in each data piece and allocates each data piece to the clusters to which the most similar centers belong; the clustering unit finishes the processing of all the data received from the registered client when allocation to the clusters is unchanged; and the clustering unit otherwise repeats processing for acquiring the center after the center of each cluster is recalculated using a characteristic quantity of data which belongs to the corresponding cluster.

10. The secure search device according to claim 8, wherein the clustering unit generates as many clusters including only one of data pieces received from the registered client as the number of the data pieces; the clustering unit instructs the similarity calculator to calculate distance between clusters using a characteristic quantity of data which belongs to each cluster and successively merges two clusters having the shortest distance; and the clustering unit repeats the merger until all objects are merged into one cluster.

11. The secure search device according to claim 8, wherein the priority calculator selects one of all data pieces which belong to each cluster as representative data for every cluster; the collator is instructed to collate a secure index of the representative data of each cluster and the trapdoor; the priority calculator calculates the priority of the cluster to be higher priority as a rate of coincidence with the trapdoor is larger; and the priority calculator sorts the order of the collation of data included in the cluster to be higher priority.

12. The secure search device according to claim 8, wherein the collator applies exclusive-OR to the secure index and the trapdoor; the collator extracts a bit string having the same length as a random number sequence generated in generating the secure index from the head of the exclusive-OR and calculates a message digest of the bit string; the collator judges that the secure index includes a search query corresponding to the trapdoor when the message digest is coincident with a bit string of which the message digest of the exclusive-OR is not calculated; and the collator judges that the secure index does not include the search query corresponding to the trapdoor when the message digest is not coincident with the bit string.

13. The secure search device according to claim 8, wherein the search client sets a frequency of collation performed by the search unit.

14. The secure search device according to claim 8, wherein the secure search device sets a frequency of collation performed by the search unit.

15-16. (canceled)